Efficient Development of Parallel NLP Applications
Authors
Abstract
Parallel programming is becoming increasingly popular: computers have more and more cores (processors), and large computer clusters are becoming available. Yet there is still no good programming framework for these architectures, and thus no simple, unified way for NLP applications to take advantage of the potential speedup. In this paper, we develop a broadly applicable parallel programming method for NLP problems. Our work stands in distinct contrast to the tradition of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show how problems that can be expressed in the LBJ framework [13] can take advantage of parallelization. We use Charm++ [7] to demonstrate the speedup of NLP applications.
Similar resources
TectoMT: Modular NLP Framework
In the present paper we describe TectoMT, a multi-purpose open-source NLP framework. It allows for fast and efficient development of NLP applications by exploiting a wide range of software modules already integrated in TectoMT, such as tools for sentence segmentation, tokenization, morphological analysis, POS tagging, shallow and deep syntax parsing, named entity recognition, anaphora resolutio...
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has supported research on statistical machine translations and other NLP applications by creating and distributing a large amount of parallel text resources for the research communities. However, manual translations are v...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
Interlingual Annotation of Parallel Text Corpora: A New Framework for Annotation and Evaluation
This paper focuses on the next step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to provide parallel corpora annotated with detailed deep ...
An Efficient Parallel Substrate for Typed Feature Structures on Shared Memory Parallel Machines
This paper describes an efficient parallel system for processing Typed Feature Structures (TFSs) on shared-memory parallel machines. We call the system Parallel Substrate for TFS (PSTFS). PSTFS is designed for parallel computing environments where a large number of agents are working and communicating with each other. Such agents use PSTFS as their low-level module for solving constraints on TF...
Publication date: 2010